
About the project

Introduction to Open Data Science is a course organized by the Doctoral School in Humanities and Social Sciences (HYMY), University of Helsinki. As mentioned on the course page, “The name of this course refers to THREE BIG THEMES: 1) Open Data, 2) Open Science, and 3) Data Science”. The learning platform for this course is the MOOC (Massive Open Online Courses) platform of the University of Helsinki: http://mooc.helsinki.fi. More information about the course can be found at this link: https://courses.helsinki.fi/en/hymy005/120776718.


My GitHub repository for the IODS-project

GitHub


Tools and Methods for open and reproducible research

In the first week of the course, I went through the general instructions and familiarized myself with the learning tools, the course content, and the schedule. Though I had previous experience with R and RStudio, I completed all the exercises and instructions given in DataCamp: R Short and Sweet to refresh my R skills; my statement of accomplishment is published here. The other exercise I did in this first week is RStudio Exercise 1. I already had a GitHub account, and to follow the instructions for this exercise I forked the IODS-project repository from Tuomo Nieminen’s GitHub. After forking, I cloned the IODS-project repository to my local machine (my computer) using the command git clone. Then, using RStudio, I edited the existing R Markdown files in the repository (chapter1.Rmd, chapter2.Rmd, and index.Rmd) and added a new R Markdown file saved under the name README.Rmd. Next, I committed the changes made on the local machine using the command git commit -m and finally pushed them to GitHub using the command git push. I also changed the theme of my course diary web page to the Merlot theme. Through these exercises I refreshed my Git and GitHub knowledge and became familiar with how I will use GitHub during this course to share and publish my learning diary.
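The Git workflow described above can be sketched as shell commands (the repository URL placeholder and the commit message are illustrative, not copied from my actual session):

```shell
# fork the IODS-project repository on GitHub first, then clone your fork
git clone https://github.com/<your-username>/IODS-project.git
cd IODS-project

# edit chapter1.Rmd, chapter2.Rmd, index.Rmd and add README.Rmd, then stage them
git add chapter1.Rmd chapter2.Rmd index.Rmd README.Rmd

# record the changes locally and publish them to GitHub
git commit -m "Update course diary chapters"
git push origin master
```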


Regression and Model validation

Data wrangling

In this section, the data set given in this link has been preprocessed for downstream analysis. After creating a folder named ‘data’ in my IODS-project repository, I created a new R script named ‘create_learning2014’ in RStudio, wrote all the steps of the data wrangling process into it, and saved it in the ‘data’ folder. All the steps of the data wrangling can be found here and/or in the code shown below.

#Mealk Weldenegodguad
#date 14.11.2017

#install.packages("dplyr")

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Read the full learning2014 data from the link using the R "read.table" function; the separator is a tab ("\t") and the file includes a header, i.e. header = TRUE.

# The data is read into memory and stored in the object Learn2014

Learn2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS3-data.txt", sep = "\t", header = TRUE)

# Check the first six rows using the head function

#head(Learn2014)

#The dimensions of the data can be checked using the code below

dim(Learn2014)
## [1] 183  60
#str(Learn2014)

# The structure 'data.frame':   183 obs. of  60 variables:


#The column Attitude in Learn2014 is a sum of 10 questions; to scale it back to the original scale we divide the Attitude value by 10 and store the result in the "attitude" column

#Attitude = Da + Db + Dc + Dd + De + Df + Dg + Dh + Di + Dj

Learn2014$attitude <- Learn2014$Attitude / 10


# To scale the deep learning variables back to the original scale, select the columns related to deep learning, take the row means, and store them in the new column 'deep'

# deep=D03 + D11 + D19 + D27 + D07 + D14 + D22 + D30 + D06 + D15 + D23 + D31

deep_questions <- c("D03", "D11", "D19", "D27", "D07", "D14", "D22", "D30","D06",  "D15", "D23", "D31")

deep_columns <- select(Learn2014, one_of(deep_questions))

Learn2014$deep <- rowMeans(deep_columns)


# To scale the surface learning variables back to the original scale, select the columns related to surface learning, take the row means, and store them in the new column 'surf'.

#surf= SU02 + SU10 + SU18 + SU26 + SU05 + SU13 + SU21 + SU29 + SU08 + SU16 + SU24 + SU32

surface_questions <- c("SU02","SU10","SU18","SU26", "SU05","SU13","SU21","SU29","SU08","SU16","SU24","SU32")

surface_columns <- select(Learn2014, one_of(surface_questions))
Learn2014$surf <- rowMeans(surface_columns)

# To scale the strategic learning variables back to the original scale, select the columns related to strategic learning, take the row means, and store them in the new column 'stra'.

#stra=ST01 + ST09 + ST17 + ST25 + ST04 + ST12 + ST20 + ST28

strategic_questions <- c("ST01","ST09","ST17","ST25","ST04","ST12","ST20","ST28")

strategic_columns<- select(Learn2014, one_of(strategic_questions))
Learn2014$stra <- rowMeans(strategic_columns)


keep_columns <- c("gender","Age","attitude", "deep", "stra", "surf", "Points")

# select the 'keep_columns' to create a new dataset

Learn2014_7_column <- select(Learn2014,one_of(keep_columns))

# Exclude observations where the exam points variable is zero.

Learn2014_7_column_exclude_0 <- filter(Learn2014_7_column, Points !=0)

dim(Learn2014_7_column_exclude_0)
## [1] 166   7
# set the working directory using the R function "setwd"

##############################################################

setwd("/home/melak/Open_data/IODS-project/data/")

# write the preprocessed data, i.e. the 166 x 7 table, to the file learning2014.txt; the file is tab-delimited

write.table(Learn2014_7_column_exclude_0 ,"/home/melak/Open_data/IODS-project/data/learning2014.txt", sep="\t")

#############################################################

Analysis

The data for this section comes from a survey on teaching and learning approaches conducted by Kimmo Vehkalahti. The research project was made possible by funding awarded to the Teachers’ Academy (2013-2015). The data was collected from 3.12.2014 to 10.1.2015. The survey includes 183 respondents, and the questionnaire includes 60 variables/questions in total.

library(GGally)
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
students2014=read.table("data/learning2014.txt", sep="\t", header=TRUE)
dim(students2014)
## [1] 166   7
str(students2014)
## 'data.frame':    166 obs. of  7 variables:
##  $ gender  : Factor w/ 2 levels "F","M": 1 2 1 2 2 1 2 1 2 1 ...
##  $ Age     : int  53 55 49 53 49 38 50 37 37 42 ...
##  $ attitude: num  3.7 3.1 2.5 3.5 3.7 3.8 3.5 2.9 3.8 2.1 ...
##  $ deep    : num  3.58 2.92 3.5 3.5 3.67 ...
##  $ stra    : num  3.38 2.75 3.62 3.12 3.62 ...
##  $ surf    : num  2.58 3.17 2.25 2.25 2.83 ...
##  $ Points  : int  25 12 24 10 22 21 21 31 24 26 ...
summary(students2014)
##  gender       Age           attitude          deep            stra      
##  F:110   Min.   :17.00   Min.   :1.400   Min.   :1.583   Min.   :1.250  
##  M: 56   1st Qu.:21.00   1st Qu.:2.600   1st Qu.:3.333   1st Qu.:2.625  
##          Median :22.00   Median :3.200   Median :3.667   Median :3.188  
##          Mean   :25.51   Mean   :3.143   Mean   :3.680   Mean   :3.121  
##          3rd Qu.:27.00   3rd Qu.:3.700   3rd Qu.:4.083   3rd Qu.:3.625  
##          Max.   :55.00   Max.   :5.000   Max.   :4.917   Max.   :5.000  
##       surf           Points     
##  Min.   :1.583   Min.   : 7.00  
##  1st Qu.:2.417   1st Qu.:19.00  
##  Median :2.833   Median :23.00  
##  Mean   :2.787   Mean   :22.72  
##  3rd Qu.:3.167   3rd Qu.:27.75  
##  Max.   :4.333   Max.   :33.00

After preprocessing the raw data, the final data set has 7 columns (“gender”, “Age”, “attitude”, “deep”, “stra”, “surf”, and “Points”) and 166 individuals, as shown in the output above.

plot_students2014=ggpairs(students2014, mapping = aes(col = gender,alpha = 0.3), lower = list(combo = wrap("facethist", bins = 20)))

plot_students2014

The plot above shows that there are 110 female and 56 male respondents in the survey. Moreover, the plot shows how the variables (Points, Age, attitude, deep, stra, and surf) are correlated. Based on the absolute correlation values, attitude and Points are the most highly correlated, while deep and Points are the least correlated.

The explanatory variables were chosen based on their (absolute) correlation with Points; the top three are attitude = 0.437, stra = 0.146, and surf = -0.144. Using these variables, the model is built with the R function “lm”.

my_model1 <- lm(Points ~ attitude + stra + surf, data = students2014)

summary(my_model1)
## 
## Call:
## lm(formula = Points ~ attitude + stra + surf, data = students2014)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.1550  -3.4346   0.5156   3.6401  10.8952 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.0171     3.6837   2.991  0.00322 ** 
## attitude      3.3952     0.5741   5.913 1.93e-08 ***
## stra          0.8531     0.5416   1.575  0.11716    
## surf         -0.5861     0.8014  -0.731  0.46563    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.296 on 162 degrees of freedom
## Multiple R-squared:  0.2074, Adjusted R-squared:  0.1927 
## F-statistic: 14.13 on 3 and 162 DF,  p-value: 3.156e-08

The significant parameters in the summary table above are the intercept (p = 0.00322) and attitude (p = 1.93e-08). Hence stra and surf are not included in the model below. Using the “lm” function again, a linear regression model is built between Points and attitude; the model after removing the insignificant variables is summarized below. The adjusted R-squared value changed only slightly, from 0.1927 (older model) to 0.1856 (newer model), while the F-statistic (from 14.13 to 38.61) and its p-value (from 3.156e-08 to 4.119e-09) improved considerably. Thus the R-squared value alone does not determine the quality of a model: a simpler model with a marginally lower R-squared can be preferable when the dropped variables add no significant explanatory power.
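The choice to drop stra and surf can also be made formally with an F-test for nested models. A sketch with synthetic data (the variable names mirror the ones above, but the numbers are simulated, not the course data):

```r
# Sketch with synthetic data: an F-test comparing nested linear models,
# analogous to dropping 'stra' and 'surf' from my_model1 above.
set.seed(42)
n <- 166
attitude <- rnorm(n, mean = 3.1, sd = 0.7)
stra <- rnorm(n, mean = 3.1, sd = 0.8)
surf <- rnorm(n, mean = 2.8, sd = 0.5)
Points <- 11 + 3.4 * attitude + rnorm(n, sd = 5)  # only attitude matters here

full <- lm(Points ~ attitude + stra + surf)
reduced <- lm(Points ~ attitude)

anova(reduced, full)  # a large p-value means the extra terms add little
```

A non-significant F-test here supports keeping the simpler model, which is the same conclusion the summary statistics above point to.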

my_model2 <- lm(Points ~ attitude, data = students2014)
summary(my_model2)
## 
## Call:
## lm(formula = Points ~ attitude, data = students2014)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.9763  -3.2119   0.4339   4.1534  10.6645 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.6372     1.8303   6.358 1.95e-09 ***
## attitude      3.5255     0.5674   6.214 4.12e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.32 on 164 degrees of freedom
## Multiple R-squared:  0.1906, Adjusted R-squared:  0.1856 
## F-statistic: 38.61 on 1 and 164 DF,  p-value: 4.119e-09

Model Validation

par(mfrow = c(2,2))

plot(my_model2, which=c(1,2,5))

The three diagnostic model validation plots are shown above. The assumptions of the linear model are:

  • The errors are normally distributed
  • The errors are not correlated
  • The errors have constant variance
  • The size of a given error does not depend on the explanatory variables

Based on the plots above, we can conclude that the errors are approximately normally distributed (most clearly seen in the Q-Q plot). Similarly, the residuals versus fitted values plot shows that the errors do not depend on the attitude variable. Moreover, the residuals versus leverage plot shows that even the two points towards the right have only minor influence on the fit. No single outlier dominates any of the plots, so the model assumptions are more or less valid.
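The graphical checks can be complemented with simple numeric summaries. A minimal sketch on synthetic data (the numbers are simulated for illustration, not taken from students2014):

```r
# Sketch with synthetic data: numeric companions to the diagnostic plots.
set.seed(1)
x <- rnorm(100)
y <- 12 + 3.5 * x + rnorm(100, sd = 5)
fit <- lm(y ~ x)

shapiro.test(residuals(fit))           # normality of errors (cf. the Q-Q plot)
cor(fitted(fit), abs(residuals(fit)))  # near 0 suggests constant variance
max(hatvalues(fit))                    # leverage of the most influential point
```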


Logistic regression

Data wrangling

library(GGally)
library(ggplot2)
library(ggpubr)
## Loading required package: magrittr
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:GGally':
## 
##     nasa
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(magrittr)
library(gmodels)
library(boot)

In this section, data wrangling has been done for two data sets retrieved from the UCI Machine Learning Repository. The R script used for the data wrangling can be found here.

Analysis

The data sets used in this analysis were retrieved from the UCI Machine Learning Repository. They describe student achievement in two Portuguese secondary schools. The data were collected using school reports and questionnaires that include attributes such as student grades and demographic, social, and school-related features. Two data sets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por) (Cortez et al. 2008). For this analysis, the two original data sets (mat and por) were joined, keeping only the students who answered the questionnaire in both classes. The final data set used in this analysis includes a total of 382 observations and 35 attributes.
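The join of the two data sets can be sketched with toy data frames (the columns here are a small illustrative subset of the real background variables, and the values are made up):

```r
library(dplyr)

# Toy versions of the two data sets, keyed by shared background columns
mat <- data.frame(school = c("GP", "GP", "MS"), sex = c("F", "M", "F"),
                  age = c(16, 17, 15), G3_mat = c(10, 14, 12))
por <- data.frame(school = c("GP", "MS", "MS"), sex = c("F", "F", "M"),
                  age = c(16, 15, 16), G3_por = c(13, 11, 15))

# inner_join keeps only students present in both tables
joined <- inner_join(mat, por, by = c("school", "sex", "age"))
joined  # two rows: (GP, F, 16) and (MS, F, 15)
```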

modified_data= read.table("/home/melak/Open_data/IODS-project/data/modified_data.txt", sep="\t", header=TRUE)

colnames(modified_data)
##  [1] "school"     "sex"        "age"        "address"    "famsize"   
##  [6] "Pstatus"    "Medu"       "Fedu"       "Mjob"       "Fjob"      
## [11] "reason"     "nursery"    "internet"   "guardian"   "traveltime"
## [16] "studytime"  "failures"   "schoolsup"  "famsup"     "paid"      
## [21] "activities" "higher"     "romantic"   "famrel"     "freetime"  
## [26] "goout"      "Dalc"       "Walc"       "health"     "absences"  
## [31] "G1"         "G2"         "G3"         "alc_use"    "high_use"

In order to study the relationship between high/low alcohol consumption and other variables, I chose four variables (“absences”, “sex”, “goout”, and “studytime”). My hypotheses about how each variable relates to alcohol consumption are listed below:

  • studytime: Students who dedicate a large amount of time to their studies do not have time to go out drinking alcohol (studytime and high alcohol consumption are negatively correlated)

  • sex: Male students have higher alcohol consumption than female students (male sex and high alcohol consumption are positively correlated)

  • goout: Students who go out with friends frequently consume more alcohol than students who do not (goout and high alcohol consumption are positively correlated)

  • absences: Students who consume more alcohol miss class for various reasons (e.g. hangovers, attending parties during class time) (absences and high alcohol consumption are positively correlated)
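Before any modeling, a hypothesis like the one about sex can be eyeballed with a simple proportion table. A toy sketch (the data here are invented for illustration, not from the survey):

```r
# Toy data: does high alcohol use look more common among male students?
toy <- data.frame(sex = c("F", "F", "F", "M", "M", "M"),
                  high_use = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE))

prop.table(table(toy$sex, toy$high_use), margin = 1)  # row-wise proportions
```

With the real data, the same call on modified_data$sex and modified_data$high_use gives the observed proportions behind the hypothesis.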


The distributions of my chosen variables and their relationships with alcohol consumption


A bar plot demonstrating the distributions of my chosen variables

my_variables= c("absences", "sex", "goout", "studytime", "high_use")

my_variables_data <- select(modified_data, one_of(my_variables))

colnames(my_variables_data)
## [1] "absences"  "sex"       "goout"     "studytime" "high_use"
gather(my_variables_data) %>% ggplot(aes(value)) + facet_wrap("key", scales = "free") + geom_bar()
## Warning: attributes are not identical across measure variables;
## they will be dropped


A box plot demonstrating my chosen variables and their relationships with alcohol consumption

g1 <- ggplot(modified_data, aes(x = high_use, y = absences,col=sex))

p1=g1 + geom_boxplot() + ylab("absences") + ggtitle("Student absences by alcohol consumption and sex")

g2 <- ggplot(modified_data, aes(x = high_use, y = goout, col=sex))

p2=g2 + geom_boxplot() + ylab("going out with friends")  + ggtitle("Students going out with friends, by alcohol consumption and sex") 

g3 <- ggplot(modified_data, aes(x = high_use, y = studytime, col=sex))

p3=g3 + geom_boxplot() + ylab("studytime - weekly study time")  + ggtitle("Students' weekly study time, by alcohol consumption and sex") 

ggarrange(p1, p2, p3 , labels = c("A", "B","C"), ncol = 2, nrow = 2)

#sex_high_use <- CrossTable(my_variables_data$sex, my_variables_data$high_use)
#goout_high_use <- CrossTable(my_variables_data$goout, my_variables_data$high_use)
#studytime_high_use <- CrossTable(my_variables_data$studytime, my_variables_data$high_use)


To statistically explore the relationship between my chosen variables and the high/low alcohol consumption variable, a logistic regression model was built using the R function glm().

m <- glm(high_use ~ absences + sex + goout + studytime, data = modified_data, family = "binomial")


In order to interpret the fitted model in detail, the summary, coefficients, and confidence intervals of the fitted model are calculated as shown below.

summary(m)
## 
## Call:
## glm(formula = high_use ~ absences + sex + goout + studytime, 
##     family = "binomial", data = modified_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9103  -0.7892  -0.5021   0.7574   2.6021  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.20483    0.59429  -5.393 6.94e-08 ***
## absences     0.07811    0.02244   3.481 0.000499 ***
## sexM         0.78173    0.26555   2.944 0.003242 ** 
## goout        0.72677    0.11994   6.059 1.37e-09 ***
## studytime   -0.42116    0.17058  -2.469 0.013551 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 465.68  on 381  degrees of freedom
## Residual deviance: 381.31  on 377  degrees of freedom
## AIC: 391.31
## 
## Number of Fisher Scoring iterations: 4
coef(m)
## (Intercept)    absences        sexM       goout   studytime 
## -3.20483157  0.07810746  0.78173290  0.72676824 -0.42116184
OR <- coef(m) %>% exp
CI <- confint(m) %>% exp
## Waiting for profiling to be done...
cbind(OR,CI)
##                     OR      2.5 %    97.5 %
## (Intercept) 0.04056573 0.01222314 0.1263579
## absences    1.08123884 1.03577383 1.1324143
## sexM        2.18525582 1.30370562 3.7010763
## goout       2.06838526 1.64470314 2.6349716
## studytime   0.65628388 0.46510383 0.9099505


As shown above, all four variables are statistically significant. goout has the lowest p-value, suggesting a strong association between going out with friends and high/low alcohol consumption. The positive coefficient for the goout predictor suggests that, all other variables being equal, going out with friends increases alcohol consumption. In the logit model, the response variable is the log odds: \(ln(odds) = ln(p/(1-p)) = \alpha + \beta_1 x_1 + \beta_2 x_2 + … + \beta_n x_n\). A unit increase in going out with friends increases the log odds by 0.72677. The variables absences and sex also have low p-values, 0.000499 and 0.003242 respectively. The positive coefficient for absences suggests that, all other variables being equal, a unit increase in absences increases alcohol consumption: each additional absence increases the log odds by 0.07811. sex is likewise significant (p-value 0.003242), suggesting an association between the sex of the student and high/low alcohol consumption; its positive coefficient suggests that, all other variables being equal, male students are more likely to be high alcohol consumers, with being male increasing the log odds by 0.78173. studytime is also significant, and its negative coefficient suggests that, all other variables being equal, a unit increase in studytime reduces alcohol consumption, lowering the log odds by 0.42116.


The odds are defined as the ratio of expected “successes” to “failures”: p/(1-p). Odds are an alternative way of expressing probabilities; an odds ratio (OR) greater than 1 corresponds to a higher probability of success. The exponents of the coefficients of a logistic regression model can be interpreted as odds ratios between a unit change (vs. no change) in the corresponding explanatory variable. The exponent of the goout coefficient is 2.06838526, so a unit increase in goout, all other variables being equal, multiplies the odds by about 2.07. The exponent of the sexM coefficient is 2.18525582, so being male multiplies the odds by about 2.19. Similarly, the exponent of the absences coefficient is 1.08123884, so each additional absence multiplies the odds by about 1.08. The exponent of the studytime coefficient is 0.65628388, so a unit increase in studytime, all other variables being equal, multiplies the odds by about 0.66. Hence goout, sex, and absences increase the odds of high alcohol consumption by factors of roughly 2.07, 2.19, and 1.08 respectively, whereas studytime decreases them.
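The interpretation above can be verified by hand from the reported estimates. A small sketch (coefficient values copied from the model summary; the example student profile is hypothetical):

```r
# Coefficient estimates copied from the model summary above
b <- c(intercept = -3.20483, absences = 0.07811, sexM = 0.78173,
       goout = 0.72677, studytime = -0.42116)

exp(b[["goout"]])  # odds ratio for one extra unit of goout, ~2.07

# Predicted probability of high use for a hypothetical male student with
# 4 absences, goout = 4, and studytime = 2:
logodds <- b[["intercept"]] + 4 * b[["absences"]] + b[["sexM"]] +
  4 * b[["goout"]] + 2 * b[["studytime"]]
plogis(logodds)  # inverse logit exp(x)/(1 + exp(x)), roughly 0.49
```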

Predictive power of the model
probabilities <- predict(m, type = "response")

# add the predicted probabilities to 'modified_data'

modified_data <- mutate(modified_data, probability = probabilities)

# use the probabilities to make a prediction of high_use

modified_data <- mutate(modified_data, prediction = probability>0.5)

table(high_use = modified_data$high_use, prediction = modified_data$prediction)
##         prediction
## high_use FALSE TRUE
##    FALSE   252   16
##    TRUE     65   49
# the logistic regression model m and dataset alc (with predictions) are available

# define a loss function (average prediction error)
loss_func <- function(class, prob) {
  n_wrong <- abs(class - prob) > 0.5
  mean(n_wrong)
}

# compute the average number of wrong predictions in the (training) data
loss_func(modified_data$high_use, modified_data$probability)
## [1] 0.2120419
# K-fold cross-validation

cv <- cv.glm(data = modified_data, cost = loss_func, glmfit = m, K = 10)

# average number of wrong predictions in the cross validation
cv$delta[1]
## [1] 0.2172775
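To make the loss function concrete, here is a tiny worked example (toy class labels and predicted probabilities, invented for illustration):

```r
# The loss function from above, applied to a toy example: a prediction
# counts as wrong when the probability is on the wrong side of 0.5.
loss_func <- function(class, prob) {
  n_wrong <- abs(class - prob) > 0.5
  mean(n_wrong)
}

loss_func(class = c(1, 0, 1, 0), prob = c(0.9, 0.2, 0.3, 0.6))  # 2 of 4 wrong: 0.5
```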